BDSI 2021; University of Michigan

The basics

When to use

  • Reports

  • Slides

  • Manuscripts / books

Why to use

  • R code and interpretations integrated into a single document

  • Separate the task of reporting the results from the task of formatting them:

    • decreases risk of copy-paste errors

    • decreases workload

  • Quickly create the same document in different formats, e.g. slides to show and handouts for the audience

  • Create websites

Output: whatever format you want to create, e.g. html, pdf, docx, …

pandoc: “an open-source document converter” (Wikipedia). Translates markup from one type of format, e.g. markdown, to another.

md: a document written in markdown, “a lightweight markup language with plain text formatting syntax” (Wikipedia). GitHub also uses markdown.

knitr: an R package for creating reports directly in R. It translates your R Markdown document (.Rmd), including embedded R code, to a plain markdown document.

.Rmd: file type recognized by RStudio. This is where everything goes: your header, R code chunks, and your content written in markdown.
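The knitr step of this pipeline can be tried directly from the console. The following is a minimal sketch rather than part of the original slides: it assumes the knitr package is installed, and the toy document contents are made up for illustration. It hands a small R Markdown document to knitr::knit() as text and gets plain markdown back.

```r
library(knitr)

# Build the code fence programmatically so the toy document is easy to read
fence <- strrep("`", 3)

# A toy R Markdown document (hypothetical content): text plus one R chunk
rmd <- c(
  "# A tiny report",
  "",
  "Some plain text written in markdown.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

# knit() runs the embedded R code and returns plain markdown as a string
md <- knit(text = rmd, quiet = TRUE)
cat(md)
```

pandoc would then convert that markdown to html, pdf, docx, and so on. In everyday use, rmarkdown::render() (which is what the Knit button in RStudio calls) runs both steps for you.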

From RStudio, go to File > New File > R Markdown...

Choose your document type

Get a template

“YAML” Header

Write R code in chunks

Write plain text

Knit your document to see the final product

Try it out: Option 1

Try it out: Option 2

Your turn


Takeaways

  • Chunk options control how the chunk is evaluated and used
  • You can knit the same document to different formats (sometimes easy to do, sometimes requires a bit of finagling)
  • Consider using in-line chunks instead of hard-coding results
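To make the first and last takeaways concrete, here is a small sketch, assuming the knitr package is installed; the document text and the variable x are invented for illustration. The chunk option echo = FALSE runs the code but hides its source, and the inline expression fills in a computed value instead of a hard-coded one.

```r
library(knitr)

fence <- strrep("`", 3)

rmd <- c(
  paste0(fence, "{r, echo = FALSE}"),  # chunk option: run the code, hide the source
  "x <- c(2, 4, 6)",
  fence,
  "",
  "The mean of x is `r mean(x)`."      # inline code: value is computed at knit time
)

md <- knit(text = rmd, quiet = TRUE)
cat(md)
```

If the data behind x change, the sentence updates on the next knit, which is the main argument for inline code over typing the number by hand.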

Use Markdown to tell your story

If you name a variable in an earlier code chunk, you can use it again in a later chunk.

early code chunk

library(tibble);  # provides tibble()

x <- rnorm(20);
y <- 3 * x + rnorm(length(x));
foo <- tibble(x = x, y = y);

later code chunk

library(ggplot2)
ggplot(data = foo) + 
  geom_point(aes(x, y));

Tables

foo;
## # A tibble: 20 x 2
##          x      y
##      <dbl>  <dbl>
##  1  0.390   0.121
##  2 -0.965  -2.83 
##  3  1.04    3.63 
##  4  0.125  -0.399
##  5  0.170   0.965
##  6  1.56    3.62 
##  7 -0.825  -2.56 
##  8 -1.25   -2.87 
##  9  0.555   2.53 
## 10 -0.112  -0.698
## 11 -0.429  -1.56 
## 12  0.0366  0.804
## 13  1.17    3.20 
## 14 -0.506  -2.86 
## 15  0.314  -0.244
## 16  2.18    6.35 
## 17 -0.599  -0.231
## 18 -1.96   -5.88 
## 19  0.292   0.546
## 20 -0.873  -2.07

Tables using ‘kable’

       x         y
 0.38978   0.12069
-0.96466  -2.83252
 1.04329   3.62949
 0.12483  -0.39871
 0.16966   0.96463
 1.56343   3.62120
-0.82484  -2.55974
-1.25424  -2.86593
 0.55485   2.52962
-0.11196  -0.69774
-0.42864  -1.55763
 0.03657   0.80367
 1.17391   3.19856
-0.50556  -2.85799
 0.31370  -0.24402
 2.17790   6.35158
-0.59893  -0.23119
-1.96191  -5.88256
 0.29225   0.54614
-0.87258  -2.06756
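A table like the one above can be produced with knitr::kable(). This is a sketch under stated assumptions, not the exact call used for the slides: the seed is arbitrary (so the numbers will differ from those shown), a plain data.frame stands in for the tibble, and digits = 5 is a guess based on the five decimal places displayed.

```r
library(knitr)

set.seed(2021)  # arbitrary seed (an assumption) so the numbers are reproducible
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))
foo <- data.frame(x = x, y = y)

# digits = 5 is a guess at the rounding used for the table shown above
tab <- kable(foo, digits = 5)
tab
```

Inside an R Markdown document the same call renders as a formatted table in the output format, rather than as the raw console printout of foo.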

Other Markdown basics

  • Use #, ##, ###, etc. to indicate deeper levels of headers

  • Use *, + for bulleted (unordered) lists

  • Use (i), (a), or 1. for ordered lists

  • Use *{text}* for italics, **{text}** for bold

Random lessons I’ve learned

Markdown can be really, really finicky about horizontal and vertical spacing

If something (a new header option, a code chunk, etc.) is not working as you expect, try adding an extra line break

If experimenting with a new feature, re-knit frequently

Caching

If, like me, you become a compulsive re-knitter, the code chunk option cache = TRUE is both useful and dangerous.

```{r, cache = TRUE}

(some intensive task)

```

As long as you don’t change anything in the chunk, you won’t need to re-run the intensive task upon re-knitting. However, things can go awry…

  • Open the file caching_mishap.Rmd and make sure you understand the intended behavior (should be trivial!)

  • Knit the document

  • Now edit your first chunk, changing to x <- rnorm(n = 1, mean = 100) and leaving the second chunk alone

  • Re-knit your document

That’s how we get results like this:

x <- rnorm(n = 1, mean = 100);
x;
## [1] 2.7449

What happened

We triggered a re-cache of the first chunk without triggering a re-cache of the second, so the second chunk reused its stale cached output

Possible solutions

  • Cache with caution and only cache costly chunks

  • Think about when and where you want to split your chunk

  • For chunks that may be susceptible, trigger a re-cache by adding a comment character (#) at the end of a line, or making some other innocuous change to your chunk. Even extra white space will trigger a re-cache

  • Go to Knit > Clear Knitr Cache… or directly delete the [filename]_cache folder in your working directory
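The last option can also be done from R. The sketch below uses only base R and works in a temporary directory so that nothing real is deleted; the file name stale_chunk.rdata is a hypothetical stand-in for whatever knitr actually stored.

```r
# Work in a temporary directory so nothing real is deleted
work_dir <- tempfile("knit_demo_")
dir.create(work_dir)

# Simulate the cache folder for caching_mishap.Rmd
# (the folder name follows the [filename]_cache pattern described above)
cache_dir <- file.path(work_dir, "caching_mishap_cache")
dir.create(cache_dir)
file.create(file.path(cache_dir, "stale_chunk.rdata"))  # hypothetical cached file

# Delete the whole cache folder; the next knit re-runs every cached chunk
unlink(cache_dir, recursive = TRUE)
dir.exists(cache_dir)
```

For a real document, point unlink() at the cache folder next to your .Rmd file. The recursive = TRUE argument is required because the cache is a directory.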

knitr can run code in other languages

Including

  • Python

  • SQL

  • Julia

  • Stan

  • JavaScript

Use ```{python} to start a Python code chunk, ```{julia} to start a Julia code chunk, ```{bash} to start a shell script chunk, etc.

You may need external language engines to successfully call other languages. I have not used this functionality before.

See Section 2.7 of R Markdown: The Definitive Guide
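To see which language engines your knitr installation knows about, you can list them from the console. This is a small sketch, assuming the knitr package is installed; it only inspects the engine names and does not run any non-R code.

```r
library(knitr)

# Names of all language engines this knitr installation knows about
engines <- sort(names(knit_engines$get()))
print(engines)

# Check the languages mentioned above ("js" is knitr's name for JavaScript)
c("python", "sql", "julia", "stan", "js", "bash") %in% engines
```

An engine being listed only means knitr knows how to hand the chunk off. The external program (a Python interpreter, Julia, and so on) still has to be installed for the chunk to run.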

More practice

You can knit R scripts!

You are not limited to using Markdown in Rmd files – you can knit R scripts using the same shortcut: Cmd+Shift+K / Ctrl+Shift+K

  • Use #' to indicate a switch to markdown

  • Use #+ to start a new chunk
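The two comment markers can be seen in action with knitr::spin(), which is the function that converts such a script to R Markdown. The following is a sketch, assuming the knitr package is installed; the script contents and the chunk label my-chunk are made up. Setting knit = FALSE stops after the conversion so you can inspect the generated R Markdown.

```r
library(knitr)

# A toy R script (hypothetical content) using the two special comment types
script <- c(
  "#' # A heading written in markdown",
  "#' Some explanatory text.",
  "#+ my-chunk, echo = FALSE",
  "x <- 1:10",
  "mean(x)"
)

# spin() converts the script to R Markdown; knit = FALSE stops before knitting
rmd <- spin(text = script, knit = FALSE)
cat(rmd, sep = "\n")
```

Lines starting with #' come out as ordinary markdown text, and the #+ line becomes the header of the chunk that holds the remaining R code.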

Your turn again

Open 02-exercise.R and complete the tasks. Indicate when you are done.


Data analyses in R

readr package

readr gives you tools to read data into R from external files, so that it can be wrangled and manipulated, and then to write the results back out to external files.

The workhorse of the readr package is read_csv, which reads a comma-separated values (csv) file into R as a tibble (a type of data.frame). From the help page:

read_csv(file, col_names = TRUE, col_types = NULL, locale = default_locale(), 
na = c("", "NA"), quoted_na = TRUE, quote = "\"", comment = "", trim_ws = TRUE, 
skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = show_progress(), 
skip_empty_rows = TRUE)

Typical use is my_data <- read_csv("my_files_path.csv")
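A self-contained round trip shows the typical use without needing any particular file on disk. This is a sketch, assuming the readr package is installed; the data values are made up, and the temporary file path stands in for your own csv file.

```r
library(readr)

# Write a small, made-up data set to a temporary csv file
path <- tempfile(fileext = ".csv")
toy <- data.frame(ID = c(101, 101, 102), Day = c(0, 3, 0), Size = c(10, 20, 15))
write_csv(toy, path)

# Typical use: read the file back in. col_types = cols() still guesses the
# column types (the default behaviour) but without printing the guessed spec
my_data <- read_csv(path, col_types = cols())
my_data
```

The result prints as a tibble, with each column's guessed type shown under its name.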

Mouse xenograft study

  • \(n=37\) mice implanted with human tumor
  • Randomized to one of three treatment groups (radiation only; drug only; or both drug and radiation) or no treatment
  • Each tumor on each mouse measured daily for up to 4 weeks
  • Available at American Statistical Association’s Section on Teaching of Statistics in the Health Sciences (TSHS) data portal
  • File is called tumor_growth.csv

Varna M, Bertheau P, Legres LG. Tumor Microenvironment in Human Tumor Xenografted Mouse Models. Journal of Analytical Oncology 2014; 3(3): 159-166.

(tumor_growth <- read_csv("tumor_growth.csv"))
## # A tibble: 574 x 5
##    Grp   Group    ID   Day   Size
##    <chr> <dbl> <dbl> <dbl>  <dbl>
##  1 1.CTR     1   101     0   41.8
##  2 1.CTR     1   101     3   85  
##  3 1.CTR     1   101     4  114  
##  4 1.CTR     1   101     5  162. 
##  5 1.CTR     1   101     6  178. 
##  6 1.CTR     1   101     7  325  
##  7 1.CTR     1   101    10  624. 
##  8 1.CTR     1   101    11  648. 
##  9 1.CTR     1   101    12  836. 
## 10 1.CTR     1   101    13 1030. 
## # … with 564 more rows

Digression: testing your dplyr knowledge

tumor_growth %>% 
  filter(Day %in% c(0, 14)) %>%
  group_by(Grp, Day) %>%
  summarize(mean_Size = mean(Size))

Digression: testing your dplyr knowledge

tumor_growth %>% 
  filter(Day %in% c(0, 7, 14)) %>%
  group_by(Grp, Day) %>%
  summarize(mean_Size = mean(Size),
            sd_Size = sd(Size))

Digression: testing your dplyr knowledge

tumor_growth %>% 
  filter(Grp == "1.CTR") %>%
  group_by(ID) %>% 
  summarize(n = n()) %>% 
  summarize(n = mean(n)) %>% 
  pull(n) # extract the single summary value as a plain number
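The double summarize() in the last pipeline is easier to follow on a data set small enough to check by hand. The sketch below assumes the dplyr package is installed and uses made-up data in place of tumor_growth: the first summarize() counts measurements per mouse, and the second averages those counts.

```r
library(dplyr)

# Made-up data: mouse 101 has 3 measurements, mouse 102 has 2
toy <- data.frame(
  ID   = c(101, 101, 101, 102, 102),
  Size = c(10, 20, 30, 15, 25)
)

# Step 1: one row per mouse with its number of measurements
# Step 2: average those counts across mice, then pull out the number
avg_n <- toy %>%
  group_by(ID) %>%
  summarize(n = n()) %>%
  summarize(n = mean(n)) %>%
  pull(n)

avg_n                            # 2.5
nrow(toy) / n_distinct(toy$ID)   # same quantity computed directly: 2.5
```

Running the pipeline one step at a time (stopping after each %>%) is a quick way to confirm what each verb contributes.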

What to do next

References